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1.0  Introduction 

Support  Vector  Machine  (SVM)  theory  is  a  learning  machine  theory  developed  by  V.  Vapnik.  Its  most 
common  uses  are  for  classification  problems  and  regression.  Like  other  learning  machines,  the  distribution 
of  the  population  does  not  need  to  be  known.  It  is  sufficient  only  to  know  that  a  distribution  exists.  What 
sets  SVMs  apart  from  other  learning  machines  is  its  ability  to  classify  items  correctly  with  a  relatively 
small  sample  size. 

In  this  paper  I  will  briefly  describe  learning  machines  and  SVMs.  It  will  not  be  a  complete  tutorial  on 
either  subject.  If  the  reader  desires  to  learn  more  about  them,  then  please  refer  to  the  references  at  the  end 
of  this  paper.  I  will  also  give  the  theory  and  results  of  using  SVMs  to  classify  whether  there  is  a  car  in 
front  of  you  or  not,  using  an  image  from  a  digital  camera. 

2.0  Learning  Machines 

A  learning  machine  is  a  process  for  creating  an  optimal  algorithm  that  chooses  one  class  over  another  based 
on  a  set  of  training  data.  Vapnik  picturizes  a  learning  machine  similar  to  the  one  in  figure  2.1. 


y 
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Figure  2.1 


Here,  G  is  the  generator  of  the  data.  It  is  responsible  for  the  distribution  (but  the  distribution  is  unknown). 

S  is  the  supervisor  operator.  It  separates  the  data  into  classes.  The  LM  is  the  learning  machine.  It  takes  in 
the  data  that  was  generated  by  x  and  what  class  it  belongs  to.  From  this  information  it  creates  an  algorithm 
that  is  supposed  to  represent  the  real  population  classification.  The  output  of  the  LM  is  calculation  from 
the  algorithm  of  what  class  the  data  belongs  to.  It  is  the  hope  that  the  learning  machine’s  output  and  the 
supervisor’s  output  are  equal.  The  error  is  based  on  the  differences  ofy  andy  bar.  In  other  words,  the  LM 
is  supposed  to  perform  the  following  mapping: 


x,. 


Now,  we  can  consider  y;  a  function  of  x,  and  a  where  a  is  an  adjustable  parameter  that  describes  a  function 
in  the  set  of  all  possible  functions  (see  equation  2.1) 


The  mathematical  expectation  of  the  test  error  is  defined  as: 


R(a)=  \^y-f^y\dp^y)  (eq2'2) 


where  P(jcj’)  is  the  joint  probability  distribution. 


The  goal  of  the  learning  machine  is  to  minimize  R(a).  Since  we  can  not  determine  the  value  of  R(a)  (we 
don’t  know  the  distribution  of  the  population),  we  must  bound  R(cc)  and  minimize  the  bound.  We 
determine  the  empirical  risk  as: 


(eq  2.3) 


/  is  the  number  of  observations.  We  can  now  define  the  bounds  of  the  risk  (see  equation  2.4). 
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(eq  2.4) 


Where  the  <j>(...)  is  the  VC  confidence,  h  is  the  VC  dimension,  and  q  is  chosen  so  that  0<q<l  and  1-q 
represents  the  confidence  in  the  empirical  risk.  The  VC  dimension  is  defined  as  the  maximum  number  of 
points  that  can  be  shattered  by  all  possible  equations.  The  VC  confidence  can  be  found  in  equation  2.5. 
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We  now  have  a  bound  for  the  risk,  R(a),  which  can  be  computed. 


(eq  2.5) 


3.1  Support  Vector  Machines 

The  idea  behind  many  classification  problems  is  finding  separating  curves  that  divide  data  between  their 
classes,  support  vector  machines  divide  the  data  between  two  classes.  The  value  for  one  class  is 
represented  as  1  (y  =  1  in  figure  2.1)  and  the  value  of  the  other  class  is  represented  as  -1 . 


3.2  Linear  Separable  Case 

Figure  3.1  shows  a  linear  separable  set  of  data  between  two  classes.  Let’s  call  class  1  the  set  where  the  data 
points  (the  *s)  are  to  the  upper  right  of  the  separating  line.  Call  the  data  points  (the  Xs)  to  the  lower  left  of 
the  separating  line  class  -1 . 


Our  goal  is  to  maximize  the  distance  between  the  separating  line  and  the  each  data  set.  The  following  steps 
accomplish  this: 

1 . )  Draw  a  convex  hull  around  each  data  set. 

2. )  Calculate  the  distance  between  each  point  in  one  class  and  each  side  of  the  convex  hull  in  the 
opposing  class. 

3. )  Choose  the  line,  which  makes  up  the  smallest  distance  in  part  2.  This  line  will  be  called  the 
‘Minimum  Distance  Line’  or  MDL. 

4. )  Calculate  a  line  that  intersects  the  midpoint  of  the  MDL  and  is  parallel  to  the  side  of  the  hull 
used  for  calculating  the  MDL.  This  line  is  the  separating  line. 

figure  3.2  shows  this  process. 


Figure  3.2 


Notice  that  there  are  three  points  (in  this  case)  that  determine  the  separating  line:  1  from  class  1  and  2  from 
class  -1.  This  three  points  are  considered  the  support  vectors. 

The  equation  for  the  separating  line  is 

WX  +  b  =  0  (eq  3.1) 


u>  is  the  perpendicular  vector  from  the  separating  line  and  b  is  the  offset. 

The  dividing  line  for  class  1  is  eq  3.2  where  x,  is  the  data  set  for  class  1 .  The  dividing  line  for  class  -1  is  eq 
3.3  where  x2  is  the  data  set  for  class  -1. 


w  •  +  b  -  1 

tv  •  x2  +  b  =  -1 


(eq  3.2) 
(eq  3.3) 


The  data  values  for  class  1  follow  the  inequality  of  eq  3.4  and  class  -1  follows  eq  3.4. 


W  X  +  b>\ 


(eq  3.4) 


w  x  +  b  <  —1 


(eq  3.5) 


Since  w  is  1  for  class  1  and  -1  for  class  -1,  eq  3.4  and  eq  3.5  can  be  combined  into  eq.  3.6 


(eq  3.6) 


From  eq  3.2  and  eq  3.3,  we  can  combine  them  and  normalize  to  get  eq  3.7 


(eq  3.7) 


The  optimal  dividing  line  will  be  the  line  that  maximizes  the  right  side  of  eq  3.7. 
want  to  minimize  equation  3.8  with  eq  3.6  as  one  of  the  constraints. 


Or,  in  other  words,  we 
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(eq  3.8) 


Optimization  is  done  using  LaGrange  multipliers  (a,),  minimizing  the  new  equation  with  respect  to  the 
primal  variables  w  and  b.  Call  this  equation  h(w,b,a),  or  L  for  short.  The  main  idea  is  to  find  the  saddle 
point  of  L.  The  KKT  (Karush,  Khan,  Tucker)  condition  states  that  the  saddle  point  appears  where  the 
primal  variables  vanish,  i.e. 


(eq  3.9) 


and 


=  0 


=  0 


(eq  3.10) 


So  the  solution  vectors  can  be  found  where  all  values  of  a;  do  not  equal  0.  These  are  the  support  vectors. 
Continuing  on,  we  can  substitute  eq  3.9  and  3.10  into  the  equation  for  L  and  eliminate  the  primal  variables 
(w>,  b).  We  are  left  with  the  Wolfe  dual  of  the  optimization 


w(a)=Tl=ia‘~ 


a&jy.yj 


(eq  3.11) 


subject  to  the  following  constraints: 


(eq  3.12) 


at>  0  for  i=  1,2,...,/ 
=0 


(eq  3.13) 


Finally,  the  decision  function  can  be  written  as: 


Ax )  =  sgn  (!>,«,•(-  x,)+b)  (eq  3.14) 


or, 


(eq  3.15) 


where  w  is  found  by  eq  3.10  and  b  is  computed  from: 


«/[v/((*/  •w/)+fe)-iJ=° 


(eq  3.16) 


3.3  Non-Linear  Separable  Case 

In  most  real  world  classification  problems,  the  data  does  not  follow  a  simple  linear  separation.  We  are 
often  forced  into  transforming  the  data  set  into  a  different  space  (called  feature  space)  to  accurately  portray 
all  the  features  before  any  learning  can  take  place.  This  is  very  computer  intensive  for  most  learning 
machines,  however,  support  vector  machines  do  not  need  to  transfer  all  the  data  into  the  feature  space. 

Our  goal  is  to  find  a  mapping  function  (called  <b)  from  the  real  space  (called  R")  to  the  feature  space  (called 
F).  i.e.: 


0:9?" - >F 

and  cut  down  the  time  it  takes  to  process.  If  we  look  at  the  equation  for  the  optimal  construction  of  the 
hyperplane  in  the  feature  space  (eq  3.1 1)  and  the  decision  equation  (eq.  3.14  or  eq.  3.15)  we  can  see  that 
only  dot  products,  i.e.  <&(jc)»<I>0’)>  need  to  be  manipulated.  We  don’t  have  to  transform  all  of  our  data,  only 
the  dot  products  of  the  data.  We  introduce  the  kernel  in  our  equations:  k(xj’)  =  <t>(x)»<f>(y). 

Our  decision  function  (eq.  3.14)  now  becomes 


Ax)  =  sgn  (I  ylal  •  k{x,xi  )+b)  (eq  3.17) 


and  the  Wolfe  dual  equation  (from  eq.  3.11)  becomes 


w(a)= X!=,a< -\Tl,,Ma>ajy-yAx”xj) 


(eq  3.18) 


subject  to  the  same  constraints  as  before. 


That’s  it!  No  transforming  of  data  to  feature  space,  only  the  evaluation  of  the  dot  products  in  feature 
space. 


3.4  What  About  a  Non-Separable  Case? 

The  support  vector  machines  can  still  perform  the  non-separable  case  with  relatively  good  accuracy  by 
relaxing  the  constraints.  To  do  this  we  use  a  slack  variable  (called  ^  >=  0  for  i  =  1,2,...,/).  Equation  3.6 
will  become 


T,((w*x,)+/>)>  1  (eq  3.19) 

The  end  result  is  the  same  as  the  non  separable  case  except  for  the  change  of  the  constraint  a,: 


0  <a,<C 

where  C  is  a  value  chosen  to  represent  a  limit  on  the  error.  In  the  separable  case,  we  assume  that  C  = 
infinity.  Different  values  of  C  will  change  the  separating  line.  This  can  be  graphically  shown  in  figure  3.3. 


The  circled  points  in  figure  3.3  are  the  support  vectors.  The  rest  is  self-explanatory. 


4.1  The  Classification  of  a  Car  in  Front  of  You  or  Not 

The  data  set  used  is  a  subset  of  a  set  from  TACOM.  In  this  set,  pictures  of  cars  were  taken  from  (from 
inside  the  vehicle)  in  different  areas  and  on  different  roads.  The  pictures  were  512x640  with  the  car  in  the 
center  (or  not). 

I  took  this  data  set  and  removed  all  cars  that  were  in  parking  lots  and  dirt  roads;  leaving  only  pictures  taken 
on  a  paved  road.  This  was  done  so  that  I  would  not  have  to  do  a  feature  extraction  of  different  textures  of 
roads  (dirt,  parking  lot,  paved  lane,  etc...).  The  problem  with  identifying  textures  would  be  a  very  complex 
problem  alone. 

The  data  set  was  categorized  into  having  a  car  in  front  or  not.  Cars  based  on  a  certain  distance  in  front  of 
the  camera  were  used  (estimated  visually).  Cars  further  out  were  categorized  as  a  “no  car  in  lane”.  The 
data  set  was  then  gray  scaled  and  cut;  I  took  the  center  256x256  image  (deleting  128  on  each  side  and  192 
on  the  top  and  bottom).  Finally,  the  standard  deviation  was  taken  of  each  row  and  put  into  a  vector.  This  is 
our  feature  vector  that  will  be  used  to  teach  (and  test)  the  support  vector  machine  -  it  is  a  vector  of  256 
elements. 

The  data  set  I  used  had  75  pictures  classified  with  a  car  in  front  and  54  pictures  with  no  car  in  front  (see 
figure  4.1  and  4.2  for  examples  of  the  data  set).  Notice  in  the  two  examples  the  different  contrast  in  the 
pictures.  Figure  4. 1  has  a  truck  in  the  center  on  a  highway,  but  the  trailer  of  the  truck  gives  off  a  bright 
spot.  Figure  4.2  shows  a  road  with  no  car,  but  the  picture  was  taken  in  a  shadow,  thus  part  of  the  road  is  of 
a  different  shade  then  the  rest.  The  idea  is  to  get  the  training  machine  to  recognize  these  differences  and 
disregard  them  if  need. 


Figure  4.1 


Figure  4.2 


4.2  The  Results 

For  the  kernel  I  used  a  polynomial  function  of  different  degrees  and  compared  them.  I  also  compared 
different  size  training  sets.  Any  data  I  didn’t  train  with  1  used  to  test,  so  if  the  training  set  had  25  with  a  car 
and  25  with  out  a  car,  then  the  test  set  would  be  50  with  a  car  and  29  with  out  a  car. 

A  classification  matrix  is  shown  using  the  rows  as  the  class  the  picture  belongs  to.  The  columns  represent 
what  the  image  was  classified  as.  Hence,  row  1,  column  1  represents  a  ‘no  car’  classified  correctly,  row  2, 
column  1  represents  a  ‘car’  classified  incorrectly. 


Polynomial  of  degree  2. 

Training  set:  Car  =  25,  No  Car  =  25. 
Testing  set:  Car  =  50,  No  Car  -  29. 


Classified 
as  Class  -1 

Classified 
as  Class  1 

Class  -1  (No  Car) 

21 

8 

Class  1  (Car) 

20 

30 

Classified  Correctly:  64.5% 


Polynomial  of  degree  3. 

Training  set:  Car  =  15,  No  Car  =  15. 
Testing  set:  Car  —  60,  No  Car  =  39. 


Classified 
as  Class  -1 

Classified 
as  Class  1 

Class  -1  (No  Car) 

21 

8 

Class  1  (Car) 

9 

41 

Classified  Correctly:  78.5% 


Polynomial  of  degree  4. 

Training  set:  Car  =  15,  No  Car  =15. 
Testing  set:  Car  =  60,  No  Car  =  39. 


Classified 
as  Class  -1 

Classified 
as  Class  1 

Class  -1  (No  Car) 

29 

10 

Class  1  (Car) 

13 

47 

Classified  Correctly:  76.8% 

Training  set:  Car  =  25,  No  Car  =  25. 
Testing  set:  Car  =  50,  No  Car  =  29. 

Classified 
as  Class  -1 

Classified 
as  Class  1 

Class  -1  (No  Car) 

20 

9 

Class  1  (Car) 

6 

44 

Classified  Correctly:  81.0% 

Training  set:  Car  =  35,  No  Car  =  35. 
Testing  set:  Car  =  40,  No  Car  =  19. 

Classified 
as  Class  -1 

Classified 
as  Class  1 

Class  -1  (No  Car) 

15 

4 

Class  1  (Car) 

5 

35 

Classified  Correctly:  84.7% 


As  you  can  see,  the  best  results  come  from  a  polynomial  kernel  of  degree  4  with  a  largest  training  set.  It 
classified  almost  85%  correct,  even  with  a  small  database  of  images  from  various  types. 


5.1  Final  Thoughts  on  the  Research  Project 

The  data  set  used  was  a  set  of  standard  deviations  from  each  row.  I  used  this  as  a  feature  because  I  thought 
that  the  deviation  of  the  gray  scaled  image  rows  would  be  a  good  measure  of  identifying  if  a  car  was 
present  or  not.  However,  when  converting  a  color  picture  to  a  gray  scale  image,  the  darkness  and  bright 
ness  scale  is  not  linear.  For  example,  if  the  gray  scale  range  were  from  255  to  1,  the  middle  brightness 
would  not  be  at  127.  It  would  be  around  200  (or  80, 1  forget  which  way  the  curve  is  bent).  If  it  were 
linear,  then  I  think  that  using  the  standard  deviations  as  the  feature  vector  would  give  better  results. 

Another  approach  might  be  to  try  to  extract  out  the  road  from  the  rest  of  the  data.  If  an  object  (matching 
the  rest  of  the  training  set)  were  blocking  the  road  then  there  would  be  a  car.  This  is  a  much  harder 
problem  then  I  originally  thought  it  would  be.  To  be  able  to  detect  the  road,  I  would  need  to  be  able  to 
classify  textures  or  perform  the  same  learning  as  if  there  was  a  car.  Classifying  textures  is  beyond  me  at 
this  time  (and  probably  will  always  be),  but  with  further  investigation  on  feature  spaces,  it  might  be  a 
solvable  problem. 

One  final  approach  I  thought  might  be  relatively  easy  would  be  to  use  a  box  in  a  box  approach.  Start  with 
the  center  of  the  picture  and  calculate  the  standard  deviation.  Then  make  a  larger  box  and  calculate  the 
standard  deviation  of  that.  Continue  until  you  end  up  with  a  box  that  encompasses  the  whole  area  of  the 
image.  Compare  the  standard  deviations  with  the  smallest  box  verse  the  next  larger  one.  Then  compare 
that  one  with  its  next  larger  box,  etc.  If  there  is  a  car  then  a  certain  size  box  and  all  the  smaller  ones  should 


have  relatively  the  same  standard  deviations  verse  the  out  side  boxes.  Unfortunately,  these  images  have  a 
lot  of  patches  of  darkness  were  their  should  be  light,  and  a  lot  of  patches  of  light  were  there  should  be 
darkness.  These  patches  (or  reversed  spots)  are  caused  by  many  things  (like  shadows  in  the  road,  light  cars 
with  dark  roofs,  reflection  from  wet  pavement,  a  patch  in  the  road,  etc.).  If  a  method  could  be  found  to 
outline  the  car  and  use  that  to  compare  a  relative  standard  deviation  (taking  out  reverse  spots)  then  this 
might  be  a  feasible  alternative. 


7.1  Optimization  of  the  Support  Vector  Machine 

At  the  start  of  this  project,  I  had  intended  on  coming  up  with  a  method  for  generalizing  an  optimization 
method.  With  an  unusual  amount  of  time  spent  on  learning  the  theory  behind  support  vector  machines  and 
a  problem  scheduling  time  with  people  to  talk  too  about  possible  optimization  categories  to  research,  I  have 
not  been  very  successful.  I  have,  however,  come  up  with  a  few  nuggets  of  information  for  the  ambitious  to 
pursue.  Optimization  of  the  SVMs  can  be  accomplished  by  restricting  the  kernels  to  a  low  VC  dimension. 

A  low  VC  dimension  will  reduce  the  risk  and,  hopefully,  increase  the  success  of  the  kernel.  Unfortunately, 
there  are  an  uncountable  infinite  (maybe  its  countable)  number  of  possible  functions  in  each  space.  And  a 
small  VC  dimension  does  not  guarantee  a  good  kernel  (as  shown  in  the  car  classification  problem).  1 
believe  that  optimization  will  have  to  be  found  in  other  methods.  Feature  extraction  would  be  a  good  place 
to  start. 

8.1  Final  Words  on  Support  Vector  Machines  and  this  Project 

Support  vector  machines  are  good  classifiers  for  small  amounts  of  data  with  out  a  distribution,  many  times, 
features  can  be  less  descriptive  (thus,  less  computer  intensive  to  generate  the  feature  vector)  then  many 
other  classification  methods.  This  is  the  result  of  the  learning  process  in  the  support  vector  machines. 

This  project  showed  that  it  is  possible  to  get  reasonable  classification  results  from  a  diverse  population.  To 
increase  the  results  to  something  that  might  be  useable  would  take  either  a  different  data  collection  system 
(like  using  radar  or  sound  rather  than  pictures)  or  a  more  complex  feature  extractor.  Wavelets  would  be  a 
good  start  as  would  texture  detection. 
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Appendix  A 


The  following  code  was  used  to  convert  the  images  into  the  specified  data  set  so  that  the  processing  of 
support  vector  machines  could  be  done. 

%  convert_w_std.m  program 

%  program  to  convert  images  std  to  a  vector  for  SVM  processing 
%  by  Mike  Del  Rose 

clear ; 

%  assign  values 

NumOf Imag  car  =  75;  %  number  of  images  with  a  car 

NumOf Imag_nocar  =54;  %  number  of  images  without  a  car 

shaveSide~=  192;  %  the  number  of  points  to  shave  off  on  each  side 
shaveTopBot  =  128;  %  the  number  of  points  to  shave  off  the  top  and  bottom. 
newImageSize  =  [16  16!;  %  reduce  by  a  factor  of  16  (was  256x256) 
imageFmt  =  'jpg'; 

%  ***************  car  images  for  training  *************** 

stdCarMatrix  =  []; 

for  i  =  1 ; 1 : NumOf Imag_car 

imageName  =  ['car'  num2str(i)  '.jpg'); 

%  read  image  -  imageX  =  image,  mapX  =  the  color  mapping 
imageX  =  imread (imageName, imageFmt) ; 

%  convert  to  gray  scale 

graylmageX  =  rgb2gray (imageX) ; 

%  shave  off  the  parts  not  needed 
sizelmageX  =  size (graylmageX) ; 

shavedlmageX  =  graylmageX (shaveTopBot+1 : sizelmageX ( 1 ) -shaveTopBot, 
shave Side+1 : sizelmageX (2)-shaveSide) ; 

%  make  the  features  the  std  of  each  row 
doublelmageX  =  double ( shavedlmageX) ; 
f inallmageX  =  std (doublelmageX' ) ; 

%  store  all  vectors  in  1  large  matrix 

stdCarMatrix  =  [stdCarMatrix;  finallmageX); 
end; 

%  ***************  no  car  images  for  training  *************** 

stdNoCarMatrix  =  []; 

for  i  =  1 : 1 -.NumOf  Imag_nocar 

imageName  =  ['nocar'  num2str(i)  '.jpg'); 

%  read  image  -  imageX  =  image,  mapX  =  the  color  mapping 
imageX  =  imread (imageName, imageFmt) ; 

%  convert  to  gray  scale 

graylmageX  =  rgb2gray (imageX) ; 

%  shave  off  the  parts  not  needed 
sizelmageX  =  size (graylmageX) ; 

shavedlmageX  =  graylmageX ( shaveTopBot+1 : sizelmageX ( 1 ) -shaveTopBot , 
shaveSide+1 : sizelmageX (2) -shaveSide) ; 

%  resize  to  make  a  more  useable  matrix  using  nearest  neighbor  method 
doublelmageX  =  double (shavedlmageX) ; 
finallmageX  =  std (doublelmageX ’) ; 

%  store  all  vectors  in  1  large  matrix 

stdNoCarMatrix  =  [stdNoCarMatrix;  finallmageX]; 
end; 

%  save  final  matrices 

save  ' std_car_matrix .mat '  stdCarMatrix; 
save  'std  nocar  matrix. mat'  stdNoCarMatrix; 


Appendix  B 


The  following  code  was  used  to  take  in  a  data  set  and  use  the  support  vector  machines  to  train  and  classify 
the  data. 


%  This  program  is  used  to  test  out  ideas  in  SVM. 

%  SVM  functions  used  will  be  svctrain,  and  svcplotc. 

%  Mike  Del  Rose 
% 

clear  all; 

%  Initialize  defined  variables: 

ON  =  1; 

OFF  =  0; 

%carMatrixName  =  [ ' Final_car_matrix .mat ' ] ; 

%noCarMatrixName  =  [ ' Final_nocar_matrix .mat ' ] ; 
carMatrixName  =  [ ' std_car_matrix .mat 1 ] ; 
noCarMatrixName  =  [ ' std_nocar_matrix .mat ' ] ; 

NumOf TrainCar  =  25; 

NumOfTrainNoCar  =  25; 

%  set  debug  flags 
DEBUG_DATA  =  OFF; 

DEBUG_VAR  =  OFF; 

DEBUG_PRINT_DATA  =  OFF; 

DEBUG  =  OFF; 

%  Set  Normalization  to  on  or  off  (mean  =  0,  sigma  =  1) 

NORMALIZE  =  ON; 

%  set  up  data  set  by  reading  in  data, 
cd  .  .  /project 

%  eval ( [ ’ load  '  carMatrixName  '  f inalCarMatrix ' ] ) ; 

%  eval (['load  '  noCarMatrixName  '  finalNoCarMatrix ' ] )  ; 
eval ( [ 1  load  '  carMatrixName  '  stdCarMatrix']); 
eval (['load  '  noCarMatrixName  '  stdNoCarMatrix']); 

%  Xl_Final  =  double ( f inalCarMatrix ) ;  %  make  sure  it  is  double  format  (for  STD) 

%  X2~Final  =  double ( f inalNoCarMatrix) ;  %  make  sure  it  is  double  format  (for  STD) 
Xl_Final  =  double (stdCarMatrix) ;  %  make  sure  it  is  double  format  (for  STD) 
X2~Final  =  double (StdNoCarMatrix) ;  %  make  sure  it  is  double  format  (for  STD) 

Yl_Final  =  ones (size (Xl_Final, 1) , 1) ; 

Y2_Final  =  -ones (size (X2_Final, 1) , 1) ; 

cd  . . /svm_code 

%  Normalize  data  (if  set  to  ON) 
if  (NORMALIZE) 

stdXl  =  std (Xl_Final ' ) ; 
stdX2  =  std (X2_Final 1 ) ; 
meanXl  =  mean (Xl_Final ' ) ; 
meanX2  =  mean (X2_Final 1 ) ; 
sizeXl  =  size (Xl_Final ) ; 
sizeX2  =  size (X2_Final) ; 
for  i  =  1 : 1 : sizeXl (2) 

Xl_Norm(:,i)  =  Xl_Final ( : , i)  -  meanXl'; 

Xl_Norml ( : , i)  =  Xl_Norm ( : , i)  ./  stdXl'; 
end; 

for  i  =  1 : 1 : sizeX2 (2) 

X2_Norm ( : , i )  =  X2_Final ( : , i )  -  meanX2'; 

X2_Norml ( : , i)  =  X2_Norm(:,i)  ./  stdX2'; 
end; 

%  reset  the  current  values  of  the  data 
clear  Xl_Final  X2_Final; 

Xl_Final  =  Xl_Norml; 

X2  Final  =  X2  Norml; 


end; 


if  (DEBUG  DATA)  Xl_Final,  X2_Final,  Yl_Final,  Y2_Final,  end 

%  seperate  training  and  testing  data 
sizeXl  =  size (Xl_Final) ; 
sizeX2  =  size (X2_Final) ; 

Xl_Train  =  Xl_Final (1 :NumOfTrainCar, : ) ; 

Y1  Train  =  Yl_Final (1 :NumOfTrainCar) ; 

X2~~ Train  =  X2_Final ( 1 :NumOfTrainNoCar,  ; )  ; 

Y2~Train  =  Y2_Final  (1  -.NumOfTrainNoCar)  ; 

XI  Test  =  Xl_Final (NumOf TrainCar+1 : sizeXl ( 1) ,  :  )  ; 

Yl~Test  =  Y1  Final (NumOfTrainCar+1 : sizeXl ( 1) ) ; 

X2_Test  =  X2  Final (NumOfTrainNoCar+1 :sizeX2 (1) ,:) ; 

Y2_ Test  =  Y2  Final (NumOfTrainNoCar+1 :sizeX2(l)); 


%  Combine  classes  together  to  make  training  and  testing  set 
X  =  [Xl_Train;  X2_Train] ; 

Y  =  [Yl_Train;  Y2_Train] ; 

X_Test  =  [Xl_Test;  X2_Test) ; 

Y_Test  =  [Yl_Test;  Y2_Test); 

if  (DEBUG_DATA)  X,  Y,  X_Test,  Y_Test,  end; 

if  (DEBUG_PRINT_DATA) 

xmin  =  -5;  xmax  =  12;  ymin  =  -2;  ymax  =  16; 
axis  normal 

axis ((xmin,  xmax,  ymin,  ymax)) 
title ( 1  Data  plot ' ) 
hold  on 

plot (XI  Final ( : , 1 ) , Xl_Final ( : , 2) , 1 bx ' , X2_Final ( : , 1) ,X2_Final ( : , 2 ) ,  r 

X_axis  =  [xmin: .l:xmax] ' ; 

Y_axis  =  [ymin: . 1 : ymax] ' ; 
plot  (X_axis,  0,  1  k — '  ,  0,  Y_axis,  '  k—  '  > 
else  %  continue  normal  execution 
%  initialize  variables 
ker  =  'poly' ; 

C  =  inf;  %  upper  bound  for  non-seperable  case 
xOO  =0;  %  starting  value 

pi  =  3;  %  parameter  1  of  the  kernel  function 

p2  =  0;  %  parameter  2  of  the  kernel  function 

aspect  =  0;  %  Aspect  ratio  (0  =  fixed;  1  =  variable) 

mag  =1;  %  magnification  of  the  display 

xaxis  =1;  %  xaxis  input  column  X  (default  =  1) 

yaxis  =  2;  %  yaxis  input  column  X  (default  =  2) 

if  (DEBUG) 


X,  xaxis 


fprintf  ( 'min (X (:,  xaxis) ) '  ), 

min (X ( : 

, xaxis) ) 

fprintf  ( 'max (X (:, xaxis) ) '  ), 

max (X ( : 

, xaxis) ) 

yaxis 

fprintf  ( 'min(X(:  ,  yaxis)  )  '  ), 

min (X ( : 

,  yaxis)  ) 

fprintf  ( 'max (X  < :,  yaxis) )  '  ), 

max (X ( : 

,  yaxis)  ) 

end 

param  =  [C, xOO, pi, p2] ; 

params  =  [aspect , mag, xaxis, yaxis 

,pl,p2] ; 

if  ( DEBUG_VAR)  ker,  param,  params,  end 


%  train  data  set 

[nsv,  alpha,  bias,  svi]  -  svctrain (X, Y, ker , param) ; 
if  (DEBUG)  nsv,  alpha,  bias,  param,  end 

%Plot  the  results 

s vcplotc (X, Y, ker, alpha, bias, params) 
end  %  if  not  in  DEBUG_PRINT_DATA  mode 


%  test  of  the  data  set. 
trainSet  =  X; 
trainY  =  Y; 
testSet  =  X_Test; 
p  =  [pi  p2 ] ; 

finalTest  =  svctest(ker,  alpha,  bias,  trainSet,  testSet,  trainY,  p) ; 


%  Compute  classification 
classMatrix  =  [0  0;0  0]; 
testVector  =  sign ( finalTest ) ; 
sizeTV  =  size (testVector) ; 
for  i  =  1 : 1 : sizeTV (1 ) 

if  testVector (i )  ==  Y_Test(i) 
if  Y_Test (i)  ==  1 

classMatrix(2,2)  =  classMatrix (2, 2) 
else 

classMatrix (1, 1)  =  classMatrix ( 1, 1 ) 


end; 

else 

if  Y_Test (i)  ==  1 

classMatrix (2, 1)  =  classMatrix (2, 1 ) 
else 

classMatrix (1, 2)  =  classMatrix (1, 2 ) 


end 

end; 

end; 


+  1; 
+  1; 


+  1; 
+  1; 


%  classified  class  1  correctly 
%  classified  class  -1  correctly 

%  classified  class  1  incorrectly 
%  classified  class  -1  incorrectly 


classMatrix 

output 1  =  ['for  class  1,  '  num2str (classMatrix (2 , 2 ) )  1  were  classified  correctly. 

num2str (classMatrix (2, 1) )  '  were  classified  incorrectly.']; 

output2  =  ['for  class  -1,  1  num2str (classMatrix (1,  1) )  1  were  classified  correctly. 

num2str (classMatrix (1, 2) )  '  were  classified  incorrectly.']; 

outputl,  output2 


